Introduction

This report provides a detailed workflow of the project on Homestead Tax Exemption entitlement assistance outreach for the City of Philadelphia Office of Philly Stat 360 and Office of Information Technology. The aim of the project is to design an algorithm-driven outreach campaign that can cost-effectively identify homeowners who are likely to be eligible for the Homestead Tax Exemption but are not participating in the program. The project should allow our clients to understand where these properties are located, what outreach strategies are available, and the associated costs and benefits.

The properties identified as most likely eligible for, but not enrolled in, the Homestead Exemption are also thought to be more likely to be subject to “tangled titles” or to family-rental arrangements that require an affidavit to waive the need for a rental license.

Background on Property Tax in Philadelphia

Property tax in Philadelphia for the 2025 tax year is 1.3998% of the property value, as assessed by the Office of Property Assessment. This is made up of 0.6159% (City of Philadelphia) and 0.7839% (School District). Taxes are due on March 31 each year.

Background on Homestead Exemption

The Homestead Exemption reduces the taxable portion of a homeowner’s property assessment by up to $100,000, saving up to $1,399 on real estate taxes annually. The signed bill aimed to lessen the financial burden of new property assessments on Philadelphia homeowners, whose property values increased by an average of 31% after the city delayed the annual calculations for three years due to the pandemic. Eligibility for the Homestead Exemption is as follows:
• The homeowner must own the property and use it as their primary residence.
• There are no age or income restrictions.
• The property must not be used exclusively for business purposes or as rental units (partial use is fine).
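As a rough check on the savings figure, the sketch below applies the 2025 combined rate to the exempted amount. The helper name `homestead_savings` and the sample assessed values are illustrative assumptions, and the sketch assumes the exemption cannot exceed the assessed value.

```r
# 2025 rates quoted in this report, as percentages of assessed value
city_rate   <- 0.6159
school_rate <- 0.7839
combined_rate <- (city_rate + school_rate) / 100  # 1.3998% as a proportion

# Hypothetical helper: annual savings for a given assessed value,
# assuming the exemption is capped at both $100,000 and the assessed value
homestead_savings <- function(assessed_value, max_exemption = 100000) {
  pmin(assessed_value, max_exemption) * combined_rate
}

# Illustrative assessed values (not taken from the data)
homestead_savings(c(60000, 100000, 250000))
```

Under these assumptions, any property assessed at $100,000 or more receives the full saving, while lower-valued properties save proportionally less.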

A homeowner is ineligible if already enrolled in either of these alternative real estate tax relief or abatement programs:
• Longtime Owner Occupants Program (LOOP), an income-based program for homeowners who experience a substantial increase in their property assessment.
• 10-year residential tax abatement program. A homeowner can apply for the Homestead Exemption only after the abatement is over.

Programs that can be used in conjunction with the Homestead Exemption include:
• Owner-Occupied Real Estate Tax Payment Agreement (OOPA)
• Senior Citizen Real Estate Tax Freeze
• Low-Income Real Estate Tax Freeze
• Real Estate Tax Installment Plan
• Tax Credits for Active-Duty Reserve and National Guard Members

Tangled Titles

An issue of concern that may prevent a long-term resident from claiming the Homestead Exemption is a tangled title, which occurs when a long-term resident effectively functions as a homeowner but lacks legal ownership of the property. This often happens when a family member who owned the property passes away and the legal processes needed to formalize the ownership transfer were never completed, leaving the resident ineligible for the exemption. However, Philadelphia offers a conditional Homestead Exemption of three years for such cases while the legal transfer of ownership is resolved.

Significance of Outreach

Currently, no focused or strategic efforts are being carried out to identify and reach homeowners who are not enrolled in the Homestead Exemption. Accurate identification of eligible homeowners makes cost-effective, efficient targeted outreach possible, so that these homeowners are made aware of the program and receive support in keeping their homes.

Data

Dataset Overview

The primary dataset is the Property and Assessment History, publicly available for download on OpenDataPhilly. Six relevant datasets are merged with this primary dataset on common identifying keys such as the parcel number, in order to include useful predictor variables in the model that predicts which homeowners are most likely eligible for, but not currently enrolled in, the Homestead Exemption.
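The merge described above can be sketched as a left join on the parcel number. This is a minimal illustration on toy data, not the project's actual merge: the tables `properties_toy` and `supplement_toy` and the column `extra_feature` are hypothetical.

```r
library(dplyr)

# Toy stand-ins for the primary dataset and one supplementary dataset;
# all values and the `extra_feature` column are made up for illustration
properties_toy <- tibble(
  parcel_number = c("001", "002", "003"),
  homestead_exemption = c(100000, 0, 0)
)
supplement_toy <- tibble(
  parcel_number = c("001", "003"),
  extra_feature = c(2L, 5L)
)

# A left join keeps every property and fills NA where no match exists
merged_toy <- properties_toy %>%
  left_join(supplement_toy, by = "parcel_number")

merged_toy
```

A left join is a natural choice here because every property should stay in the modeling dataset even when a supplementary source has no record for it.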

Exploratory Analysis

Property and Assessment History

Every observation in the Property and Assessment History dataset is one property in Philadelphia, for a total of 584,049 properties and 79 features. As this dataset is updated daily, the version used for this project is current as of 31 January 2025.

Creation of dependent variable

The ‘homestead_exemption’ column in this dataset indicates the amount removed from the taxable portion of the property assessment. Note that 14 properties had a homestead exemption larger than $100,000, the maximum possible amount; this is suspected to be a clerical error and has been flagged to the PhillyStat360 team. The dependent variable for the model is derived from this feature by creating a binary variable indicating whether the property is currently enrolled in the homestead exemption program, as shown by a non-zero value. There are 246,853 properties with a homestead exemption.

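library(data.table)  # fread()
library(dplyr)       # %>%, mutate()
properties <- fread("Data/opa_properties_public.csv")
# Binary dependent variable: 1 if the property has a non-zero homestead exemption
filtered_properties <- properties %>%
  mutate(exemption = ifelse(homestead_exemption == 0, 0, 1))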
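The check for exemptions above the $100,000 maximum could look something like the sketch below. It runs on toy values rather than the real file, and the column name `over_cap` is a hypothetical label introduced for illustration.

```r
library(dplyr)

# Toy data standing in for the `homestead_exemption` column
toy <- tibble(homestead_exemption = c(0, 80000, 100000, 125000))

toy <- toy %>%
  mutate(
    exemption = as.integer(homestead_exemption != 0),  # enrolled if non-zero
    over_cap  = homestead_exemption > 100000           # suspected clerical error
  )

# Number of records above the $100,000 maximum
sum(toy$over_cap)
```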

Filtering to residential properties

filtered_properties <- filtered_properties %>% mutate(is_residential = ifelse(zoning %in% c(
  "RM1", "RM2", "RM3", "RM4",
  "RSA1", "RSA2", "RSA3", "RSA4", "RSA5", "RSA6", 
  "RSD1", "RSD2", "RSD3", 
  "RM1|RSA5", "RSD1|RSD3", "RSA5|RSA5",
  "RTA1", 
  "CMX1", "CMX2", "CMX2.5", "CMX3", "CMX4", "CMX5", "IRMX"), 1, 0))

# Look into exemption status by zoning code
exemptionbyzoning <- filtered_properties %>%
  group_by(zoning, exemption) %>%
  summarise(count = n(), .groups = "drop") %>%
  tidyr::pivot_wider(names_from = exemption, values_from = count, values_fill = list(count = 0)) %>%
  rename(No_Exemption = `0`, Exemption = `1`)

# Look into number of blank / NA zoning codes
num_blank_zoning <- properties %>%
  filter(zoning == "" | is.na(zoning) | str_trim(zoning) == "") %>%
  nrow()

print(num_blank_zoning)
## [1] 2464
residential_properties <- filtered_properties %>% filter(is_residential == 1)

Transform to geodata

properties_sf <- st_as_sf(residential_properties, wkt = "shape", crs = 2272)

Homestead Rate by Census Tract

census_tracts <- st_read("Data/phila_census1.gpkg")
## Reading layer `phila_census' from data source 
##   `C:\Users\14735\WPSDrive\376583023\WPS云盘\0_MCP\25spring\Smart Cities Practicum\Philly-Homeowners\Data\phila_census1.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 408 features and 31 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -75.28027 ymin: 39.867 xmax: -74.95576 ymax: 40.13799
## Geodetic CRS:  NAD83
census_tracts <- st_transform(census_tracts, 2272)
properties_tract <- st_join(properties_sf, census_tracts)

# Create tract summary
tract_summary <- properties_tract %>%
  group_by(GEOID) %>%
  summarise(
    total_properties = n(),
    homestead_count = sum(homestead_exemption > 0, na.rm = TRUE),
    pct_homestead = (homestead_count / total_properties) * 100,
    .groups = "drop"
  ) %>%
  st_drop_geometry()

# Create final enriched dataset
census_tracts_enriched <- census_tracts %>%
  left_join(tract_summary, by = "GEOID")

General exploration of variables

One major issue faced is the large number of NA values. Even for features with a low number of NA values in the table below, further investigation reveals many empty cells.

#Number of NA values
print(residential_properties %>%
  summarise(across(everything(), ~sum(is.na(.)))) %>%
  tidyr::pivot_longer(cols = everything(), names_to = "Column", values_to = "NA_Count"), n = 100)
## # A tibble: 81 × 2
##    Column                        NA_Count
##    <chr>                            <int>
##  1 objectid                             0
##  2 assessment_date                     13
##  3 basements                            0
##  4 beginning_point                      0
##  5 book_and_page                        0
##  6 building_code                        0
##  7 building_code_description            0
##  8 category_code                        0
##  9 category_code_description            0
## 10 census_tract                        12
## 11 central_air                          0
## 12 cross_reference                 564194
## 13 date_exterior_condition         564194
## 14 depth                             3464
## 15 exempt_building                     13
## 16 exempt_land                         13
## 17 exterior_condition                   0
## 18 fireplaces                       76738
## 19 frontage                          3488
## 20 fuel                                 0
## 21 garage_spaces                    78050
## 22 garage_type                     530843
## 23 general_construction                 0
## 24 geographic_ward                      0
## 25 homestead_exemption                  0
## 26 house_extension                      0
## 27 house_number                         0
## 28 interior_condition                   0
## 29 location                             0
## 30 mailing_address_1                    0
## 31 mailing_address_2               564194
## 32 mailing_care_of                      0
## 33 mailing_city_state                   0
## 34 mailing_street                       0
## 35 mailing_zip                          0
## 36 market_value                        13
## 37 market_value_date               564194
## 38 number_of_bathrooms              76387
## 39 number_of_bedrooms               72099
## 40 number_of_rooms                 551851
## 41 number_stories                   64549
## 42 off_street_open                   6749
## 43 other_building                       0
## 44 owner_1                              0
## 45 owner_2                              0
## 46 parcel_number                        0
## 47 parcel_shape                         0
## 48 quality_grade                        0
## 49 recording_date                    3495
## 50 registry_number                      0
## 51 sale_date                         2163
## 52 sale_price                        2187
## 53 separate_utilities                   0
## 54 sewer                                0
## 55 site_type                       564194
## 56 state_code                           2
## 57 street_code                         12
## 58 street_designation                   0
## 59 street_direction                     0
## 60 street_name                          0
## 61 suffix                               0
## 62 taxable_building                    13
## 63 taxable_land                        13
## 64 topography                           0
## 65 total_area                         603
## 66 total_livable_area               39429
## 67 type_heater                          0
## 68 unfinished                      564194
## 69 unit                                 0
## 70 utility                         564194
## 71 view_type                            0
## 72 year_built                       39426
## 73 year_built_estimate                  0
## 74 zip_code                             0
## 75 zoning                               0
## 76 pin                                  0
## 77 building_code_new                    0
## 78 building_code_description_new        0
## 79 shape                                0
## 80 exemption                            0
## 81 is_residential                       0
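Because empty cells are not counted by `is.na()`, the NA counts above understate missingness in text columns. One possible way to count both, sketched on toy data (the data frame `toy` and the output column `Missing_Count` are illustrative):

```r
library(dplyr)
library(tidyr)

# Toy data: one true NA and two blank strings in `garage_type`
toy <- tibble(
  garage_type = c("A", "", " ", NA),
  depth       = c(50, NA, 70, 80)
)

# Count a cell as missing if it is NA or, for text columns, only whitespace
missing_counts <- toy %>%
  summarise(across(
    everything(),
    ~ sum(is.na(.x) | (is.character(.x) & trimws(as.character(.x)) == ""))
  )) %>%
  pivot_longer(everything(), names_to = "Column", values_to = "Missing_Count")

missing_counts
```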
# Transform data to WGS84 (required for leaflet)
census_tracts_wgs84 <- st_transform(census_tracts_enriched, 4326)

# Create interactive map
leaflet(census_tracts_wgs84) %>%
  addTiles() %>%  # Add OpenStreetMap base map
  addPolygons(
    fillColor = ~colorNumeric(
      palette = "viridis",
      domain = c(0, 100)
    )(pct_homestead),
    fillOpacity = 0.7,
    weight = 1,
    color = "white",
    popup = ~paste(
      "Census Tract:", GEOID, "<br>",
      "Homestead %:", round(pct_homestead, 1), "<br>",
      "Population Density:", round(pop_density, 0), "<br>",
      "Total Properties:", total_properties, "<br>",
      "Median Income:", scales::dollar(median_income)
    )
  ) %>%
  addLegend(
    position = "bottomright",
    pal = colorNumeric("viridis", domain = c(0, 100)),
    values = ~pct_homestead,
    title = "% Homestead Exemption",
    opacity = 0.7
  )
census_tracts_filtered <- census_tracts_enriched %>%
  filter(pop_density > 0 & total_properties >= 30)  # 30 properties minimum

census_tracts_invalid <- census_tracts_enriched %>%
  filter(pop_density == 0 | total_properties < 30)  # Include low property counts in "invalid"


# Transform both to WGS84
census_tracts_invalid_wgs84 <- st_transform(census_tracts_invalid, 4326)
census_tracts_filtered_wgs84 <- st_transform(census_tracts_filtered, 4326)

# Create map with both layers
leaflet() %>%
  addTiles() %>%
  # Add invalid tracts first (in gray)
  addPolygons(data = census_tracts_invalid_wgs84,
    fillColor = "gray",
    fillOpacity = 0.5,
    weight = 1,
    color = "white",
    popup = "No data available"
  ) %>%
  # Add valid tracts with your original styling
  addPolygons(data = census_tracts_filtered_wgs84,
    fillColor = ~colorNumeric(
      palette = "viridis",
      domain = c(0, 85)
    )(pct_homestead),
    fillOpacity = 0.7,
    weight = 1,
    color = "white",
    popup = ~paste(
    "Census Tract:", GEOID, "<br>",
    "Homestead %:", round(pct_homestead, 1), "%<br>",
    "Total Properties:", total_properties,
    ifelse(total_properties < 100, 
           "<br><i style='color:red'>Note: Low property count may affect reliability</i>", 
           "")
  )

  )%>%
  addLegend(
    position = "bottomright",
    pal = colorNumeric("viridis", domain = c(0, 85)),
    values = census_tracts_filtered_wgs84$pct_homestead,
    title = "% Homestead Exemption",
    opacity = 0.7,
    labFormat = labelFormat(suffix = "%")
  ) %>%
  # Add legend for gray areas
  addLegend(
    position = "bottomright",
    colors = "gray",
    labels = "No Data Available",
    opacity = 0.5
  )
# Create histogram of homestead exemption distribution
census_hist <- ggplot(census_tracts_enriched %>% 
       filter(pop_density > 0), # Filter out zero population tracts
       aes(x = pct_homestead)) +
  geom_histogram(
    binwidth = 5,
    fill = "#008d8a",
    color = "white"
  ) +
  labs(
    title = "Distribution of Homestead Exemption Rates Across Philadelphia Census Tracts",
    subtitle = "Excluding Zero Population Density Tracts",
    x = "Percentage of Properties with Homestead Exemption",
    y = "Number of Census Tracts"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 18, face = "bold"),
    plot.subtitle = element_text(size = 14),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 14),
    legend.text = element_text(size = 14)
  ) +
  scale_x_continuous(breaks = seq(0, 100, by = 10))

census_hist

The distribution is roughly normal in shape, with most tracts clustered between 30% and 50%. There is a notable drop-off in the number of tracts below 30%, and relatively few tracts have rates below 20%.

Therefore, census tracts with homestead exemption rates below 30% could be considered to have low enrollment and might warrant targeted outreach or investigation into barriers to participation, assuming they are primarily residential areas and not institutional/special use tracts.

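homestead_pattern <- ggplot(census_tracts_enriched %>%
                           # Gray out zero-population tracts so the map matches its caption
                           mutate(pct_homestead = ifelse(pop_density > 0, pct_homestead, NA_real_))) +
  geom_sf(aes(fill = cut(pct_homestead, 
              breaks = c(0, 20, 30, 40, 50, 60, 100),
              labels = c("<20%", "20-30%", "30-40%", "40-50%", "50-60%", ">60%"),
              include.lowest = TRUE))) +  # keep tracts at exactly 0% in the "<20%" class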
  scale_fill_viridis_d(
    name = "Homestead\nExemption Rate",
    na.value = "gray80",
    guide = guide_legend(reverse = TRUE)
  ) +
  labs(
    title = "Homestead Exemption Rates Across Philadelphia",
    subtitle = "By Census Tract (Excluding Zero Population Areas)",
    caption = "Gray areas indicate zero population density tracts"
  ) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    plot.title = element_text(size = 18, face = "bold"),
    plot.subtitle = element_text(size = 14),
    axis.title = element_text(size = 14),
    axis.text = element_blank(),
    legend.text = element_text(size = 14)
  )
homestead_pattern

#ggsave("outputs/homestead-exemption-pattern.png", homestead_pattern, width = 10, height = 6)
# Basic summary statistics
summary(census_tracts_enriched$pop_density)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0   10522   18246   19616   27127   92575
# More detailed statistics
census_tracts_enriched %>%
  summarise(
    mean_density = mean(pop_density, na.rm = TRUE),
    median_density = median(pop_density, na.rm = TRUE),
    q1 = quantile(pop_density, 0.25, na.rm = TRUE),
    q3 = quantile(pop_density, 0.75, na.rm = TRUE)
  )
## Simple feature collection with 1 feature and 4 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 2660586 ymin: 204650.6 xmax: 2750109 ymax: 304965.3
## Projected CRS: NAD83 / Pennsylvania South (ftUS)
##   mean_density median_density       q1       q3                           geom
## 1     19615.79       18246.11 10521.61 27126.94 POLYGON ((2679429 207838.9,...
# Visual distribution
ggplot(census_tracts_enriched, aes(x = pop_density)) +
  geom_histogram(binwidth = 1000) +
  theme_minimal() +
  labs(title = "Distribution of Population Density in Philadelphia Census Tracts",
       x = "Population Density (per square mile)",
       y = "Count of Census Tracts")

homestead_pattern <- ggplot(census_tracts_enriched %>% 
                           mutate(pct_homestead = case_when(
                             pop_density < 2000 | total_properties < 100 ~ NA_real_,
                             TRUE ~ pct_homestead))) +
  geom_sf(aes(fill = cut(pct_homestead, 
              breaks = c(0, 20, 30, 40, 50, 60, 100),
              labels = c("<20%", "20-30%", "30-40%", "40-50%", "50-60%", ">60%"),
              include.lowest = TRUE))) +  # keep tracts at exactly 0% in the "<20%" class
  scale_fill_viridis_d(
    name = "Homestead\nExemption Rate",
    na.value = "gray80",
    guide = guide_legend(reverse = TRUE)
  ) +
  labs(
    title = "Homestead Exemption Rates Across Philadelphia",
    subtitle = "By Census Tract",
    caption = "Gray tracts indicate population density < 2,000 per sq. mile or fewer than 100 properties"
  ) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    plot.title = element_text(size = 18, face = "bold"),
    plot.subtitle = element_text(size = 14),
    axis.title = element_text(size = 14),
    axis.text = element_blank(),
    legend.text = element_text(size = 14)
  )
homestead_pattern

# Distribution of pct_homestead
summary(tract_summary$pct_homestead)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   28.08   41.50   41.73   54.92  100.00
# Map for low rates (under 30%)
ggplot(census_tracts_enriched) +
  geom_sf(aes(fill = ifelse(pop_density > 0 & pct_homestead < 30, 
              pct_homestead, NA))) +
  scale_fill_viridis_c(
    name = "% Homestead\nExemption\n(Under 30%)",
    na.value = "gray80",
    limits = c(0, 30),
    breaks = seq(0, 30, by = 5)
  ) +
  labs(
    title = "Low Homestead Exemption Rates in Philadelphia",
    subtitle = "Census Tracts Below 30% Enrollment",
    caption = "Gray areas: Zero population density or rates ≥ 30%"
  ) +
  map_theme

# Table for low rates
low_enrollment_tracts <- census_tracts_enriched %>%
  filter(pop_density > 0 & pct_homestead < 30) %>%
  select(GEOID, pct_homestead, pop_density) %>%
  arrange(pct_homestead)

low_enrollment_tracts %>%
  st_drop_geometry() %>% 
  arrange(pct_homestead) %>%
  kable(
    col.names = c("Census Tract", "% Homestead", "Population Density"),
    digits = 2,
    caption = "Census Tracts with Low Homestead Exemption Rates (<30%)"
  )
Census Tracts with Low Homestead Exemption Rates (<30%)
Census Tract % Homestead Population Density
42101008801 0.00 31288.99
42101012201 0.00 20308.08
42101012203 0.00 12216.50
42101036901 0.00 66.50
42101036902 0.00 18371.62
42101989300 0.00 145.65
42101012501 0.65 19593.83
42101000600 2.15 23041.85
42101008802 3.82 40358.85
42101000500 5.37 18466.26
42101015300 8.35 27884.65
42101015600 9.65 10032.78
42101014700 11.48 25395.42
42101014000 12.15 33879.19
42101016200 12.65 16491.54
42101000702 13.26 54828.50
42101009000 14.26 36339.49
42101016500 14.26 18703.31
42101000404 14.69 49405.57
42101010600 15.91 17551.67
42101010800 16.28 20039.58
42101016300 16.34 11835.92
42101014500 16.41 17870.04
42101010900 16.48 24584.09
42101016600 16.53 22256.69
42101016400 16.91 21458.61
42101014800 17.14 12169.05
42101000901 17.73 53169.00
42101008702 18.11 25905.00
42101017701 18.38 34113.71
42101015200 18.43 23128.35
42101037700 18.46 18800.68
42101013300 18.56 25325.92
42101016702 18.56 18781.08
42101013100 18.67 15041.99
42101037600 19.03 12327.66
42101017800 19.41 25766.40
42101020000 19.65 9370.00
42101000101 19.76 14330.64
42101014400 19.84 22719.23
42101017601 19.97 22499.74
42101013900 20.53 11893.19
42101000701 20.93 33310.62
42101020101 21.32 17117.37
42101009100 21.42 18026.35
42101029400 21.46 15360.37
42101017400 21.82 17969.72
42101010700 21.97 18285.44
42101013200 22.02 17949.73
42101006600 22.24 13505.86
42101014100 22.31 13739.18
42101013800 22.33 15763.50
42101006300 22.40 20581.10
42101000805 22.57 92575.12
42101015101 22.71 28989.00
42101011000 22.72 13900.65
42101029300 22.83 13460.26
42101014202 22.95 11438.46
42101000401 23.06 31951.13
42101024100 23.32 8896.72
42101017702 23.46 25018.11
42101036700 23.72 12813.19
42101020300 23.89 13067.52
42101016701 24.19 35668.68
42101000102 24.37 16883.65
42101018801 24.43 23895.47
42101002000 24.57 18106.87
42101000200 24.59 23376.36
42101016100 24.71 22414.27
42101015102 24.77 22941.91
42101003300 24.80 15304.04
42101016901 25.09 16946.58
42101014900 25.15 24675.81
42101038100 25.29 608.11
42101003100 25.36 32820.35
42101017500 25.97 24161.65
42101024600 26.04 10639.09
42101008701 26.16 33669.26
42101003200 26.49 22098.03
42101016902 26.53 18034.08
42101013701 26.66 17167.16
42101011100 26.72 8056.80
42101037800 26.74 1524.12
42101016800 26.94 17662.29
42101009200 27.30 17421.38
42101024500 27.51 15856.83
42101008602 27.52 22227.50
42101013702 27.81 33356.17
42101014300 27.83 8977.12
42101029900 27.91 19596.37
42101010500 28.00 16738.99
42101017900 28.16 24208.45
42101018802 28.32 43792.22
42101030000 28.44 23583.79
42101009400 28.55 27290.24
42101017602 28.60 26348.66
42101000902 28.80 45668.26
42101004101 28.82 34632.43
42101014201 28.96 29354.24
42101015700 29.17 13311.10
42101007700 29.41 14586.80
42101019200 29.49 34549.40
42101010300 29.64 24235.53
42101013500 29.68 26085.90
42101009500 29.71 30478.18
# Map for high rates (over 60%)
ggplot(census_tracts_enriched) +
  geom_sf(aes(fill = ifelse(pop_density > 0 & pct_homestead > 60, 
              pct_homestead, NA))) +
  scale_fill_viridis_c(
    name = "% Homestead\nExemption\n(Over 60%)",
    na.value = "gray80",
    limits = c(60, 82),
    breaks = seq(60, 80, by = 5)
  ) +
  labs(
    title = "High Homestead Exemption Rates in Philadelphia",
    subtitle = "Census Tracts Above 60% Enrollment",
    caption = "Gray areas: Zero population density or rates ≤ 60%"
  ) +
  map_theme

# Table for high rates
high_enrollment_tracts <- census_tracts_enriched %>%
  filter(pop_density > 0 & pct_homestead > 60) %>%
  select(GEOID, pct_homestead, pop_density) %>%
  arrange(desc(pct_homestead))

high_enrollment_tracts %>%
  st_drop_geometry() %>% 
  kable(
    col.names = c("Census Tract", "% Homestead", "Population Density"),
    digits = 2,
    caption = "Census Tracts with High Homestead Exemption Rates (>60%)"
  )
# Distribution of pct_homestead with new criteria
# (pop_density is a column of census_tracts_enriched, not of tract_summary)
census_tracts_enriched %>%
  st_drop_geometry() %>%
  filter(pop_density >= 2000 & total_properties >= 100) %>%
  pull(pct_homestead) %>%
  summary()
# Map for low rates (under 30%)
ggplot(census_tracts_enriched) +
  geom_sf(aes(fill = ifelse(pop_density >= 2000 & total_properties >= 100 & pct_homestead < 30, 
              pct_homestead, NA))) +
  scale_fill_viridis_c(
    name = "% Homestead\nExemption\n(Under 30%)",
    na.value = "gray80",
    limits = c(0, 30),
    breaks = seq(0, 30, by = 5)
  ) +
  labs(
    title = "Low Homestead Exemption Rates in Philadelphia",
    subtitle = "Census Tracts Below 30% Enrollment",
    caption = "Gray areas: Low density (<2000/sq mi), low property count (<100), or rates ≥ 30%"
  ) +
  map_theme

# Map for high rates (over 60%)
ggplot(census_tracts_enriched) +
  geom_sf(aes(fill = ifelse(pop_density >= 2000 & total_properties >= 100 & pct_homestead > 60, 
              pct_homestead, NA))) +
  scale_fill_viridis_c(
    name = "% Homestead\nExemption\n(Over 60%)",
    na.value = "gray80",
    limits = c(60, 82),
    breaks = seq(60, 80, by = 5)
  ) +
  labs(
    title = "High Homestead Exemption Rates in Philadelphia",
    subtitle = "Census Tracts Above 60% Enrollment",
    caption = "Gray areas: Low density (<2000/sq mi), low property count (<100), or rates ≤ 60%"
  ) +
  map_theme

census_tracts_enriched <- census_tracts_enriched %>%
  mutate(
    owner_occ_rate = (owner_hh / occupied_units) * 100
  )

low_enrollment_tracts <- census_tracts_enriched %>%
  filter(pop_density >= 2000 & 
         total_properties >= 100 & 
         pct_homestead < 30 &
         owner_occ_rate > 40) %>%
  select(GEOID, pct_homestead, pop_density, total_properties) %>%
  arrange(pct_homestead)

# For the map visualization
low_enrollment_tracts_map <- ggplot(census_tracts_enriched) +
  geom_sf(aes(fill = case_when(
    pop_density >= 2000 & 
    total_properties >= 100 & 
    pct_homestead < 30 &
    owner_occ_rate > 40 ~ "#f4aa9e",
    TRUE ~ "gray80"
  ))) +
  scale_fill_identity() +
  labs(
    title = "Low Homestead Exemption Tracts",
    subtitle = "Tracts with <30% Homestead Rate & >40% Owner Occupancy",
    caption = "Gray areas do not meet filtering criteria"
  ) +
  map_theme




low_enrollment_tracts_map

#ggsave("outputs/homestead-exemption-low_enrollment_tracts.png", low_enrollment_tracts_map, width = 10, height = 6)
ggplot(census_tracts_enriched %>% 
       filter(pop_density >= 2000),
       aes(x = owner_occ_rate)) +
  geom_histogram(
    binwidth = 5,
    fill = "#e42524",
    alpha = 0.8,
    color = "white"
  ) +
  labs(
    title = "Distribution of Owner Occupancy Rates Across Philadelphia Census Tracts",
    subtitle = "Excluding Low Density Areas (<2,000 per square mile)",
    x = "Owner Occupancy Rate (%)",
    y = "Number of Census Tracts"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 18, face = "bold"),
    plot.subtitle = element_text(size = 14),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12)
  ) +
  scale_x_continuous(breaks = seq(0, 100, by = 10))

Predictor Variables

Predictor Variables 1 & 2: Eligibility Characteristics

The first predictor variable is based on the key eligibility criterion of the Homestead Exemption program: whether a homeowner resides long-term in the property itself. The closest proxy is whether the mailing street address (‘mailing_street’) is the same as the property street address (‘location’), as the owner’s mailing address is often where the owner lives long-term.

The second predictor variable is based on ineligibility for the Homestead Exemption when the homeowner is already enrolled in particular existing tax programs such as LOOP and the residential tax abatement. This is determined by whether an existing tax relief exempts a portion of the building from tax. If there is a non-zero value for ‘exempt_building’ and the property is not currently enrolled in the Homestead Exemption, it is potentially already enrolled in LOOP or in the residential tax abatement.

residential_properties <- residential_properties %>% mutate(same_address = ifelse(mailing_street == location, 1, 0))

ggplot(residential_properties, aes(x = same_address, fill = as.factor(exemption))) +
    geom_bar(position = "dodge") +
    scale_fill_manual(values = colors, labels = c("No Exemption", "With Exemption")) +
    labs(title = "Mailing Address Matches Property Address", x = "Address Match", y = "Count", fill = "Homestead Exemption") +
    theme_minimal(base_size = 14) +
    theme(panel.grid.major = element_line(color = "grey90"),
          panel.grid.minor = element_blank(),
          legend.position = "bottom",
         plot.title = element_text(vjust = 0.5))

residential_properties <- residential_properties %>% mutate(potential_otherprog = ifelse(exempt_building > 0 & exemption == 0, 1, 0))
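The two flags above can be illustrated on a toy table. This is a minimal sketch in base R, not the report's pipeline: the four rows are invented, and normalize_address() is a hypothetical helper added here to show how case, stray whitespace, and missing mailing addresses affect an exact string comparison.

```r
# Toy data: column names follow the report, values are invented for illustration.
props <- data.frame(
  location        = c("123 MAIN ST", "45 OAK AVE", "9 ELM ST", "77 PINE ST"),
  mailing_street  = c("123 MAIN ST", "45 oak ave ", "PO BOX 12", NA),
  exempt_building = c(0, 0, 45000, 0),
  exemption       = c(1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: upper-case, trim, and collapse repeated spaces.
normalize_address <- function(x) {
  gsub("\\s+", " ", trimws(toupper(x)))
}

# Exact comparison, as in the report: case or whitespace differences count as
# mismatches, and a missing mailing address yields NA rather than 0.
raw_match <- ifelse(props$mailing_street == props$location, 1, 0)

# Normalized comparison that treats a missing mailing address as "no match".
norm_match <- as.integer(
  !is.na(props$mailing_street) &
    normalize_address(props$mailing_street) == normalize_address(props$location)
)

# Second flag: building exemption on record but no Homestead Exemption.
potential_otherprog <- ifelse(props$exempt_building > 0 & props$exemption == 0, 1, 0)

print(data.frame(props$location, raw_match, norm_match, potential_otherprog))
```

Whether normalization is worth applying depends on how consistently the address fields are formatted in the source data, which this sketch does not assess.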

Predictor Variables 3 & 4: Property Characteristics

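Property depth and total property area are also included as predictors, each converted to a binary flag (‘is_deep’ for depth above 150, ‘large_area’ for total area above 150,000), although they are less useful than the eligibility criteria.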

residential_properties <- residential_properties %>% mutate(is_deep = ifelse(depth > 150, 1, 0)) 
ggplot(residential_properties %>% 
         filter(!is.na(is_deep)), 
       aes(x = factor(is_deep), fill = factor(exemption))) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = colors, labels = c("No Exemption", "With Exemption")) +
  labs(title = "Property Depth by Homestead Exemption", 
       x = "Depth > 150", 
       y = "Count", 
       fill = "Exemption") +
  theme_minimal(base_size = 14) +
  theme(panel.grid.major = element_line(color = "grey90"),
        panel.grid.minor = element_blank(),
        legend.position = "bottom",
        plot.title = element_text(vjust = 0.5))

residential_properties %>%
  count(is_deep, exemption) %>%
  tidyr::spread(key = exemption, value = n, fill = 0)  
##   is_deep      0      1
## 1       0 307320 234743
## 2       1   9985   8682
## 3      NA   3358    106
residential_properties %>%
  count(same_address, exemption) %>%
  tidyr::spread(key = exemption, value = n, fill = 0)  
##   same_address      0      1
## 1            0 197822  23683
## 2            1 122841 219848
residential_properties <- residential_properties %>% mutate(large_area = ifelse(total_area > 150000, 1, 0))
residential_properties %>%
  count(large_area, exemption) %>%
  tidyr::spread(key = exemption, value = n, fill = 0)  
##   large_area      0      1
## 1          0 319463 243508
## 2          1    603     17
## 3         NA    597      6
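The cross-tabulations above are easier to compare as exemption rates. The sketch below copies the printed counts into a small table and computes the share of properties with the exemption in each group. Only the layout of the table is an assumption; the numbers are taken from the output above.

```r
# Counts copied from the cross-tabulations printed in this section
# (no_exempt = exemption 0, with_exempt = exemption 1; NA rows omitted).
counts <- data.frame(
  flag        = c("same_address = 0", "same_address = 1",
                  "is_deep = 0", "is_deep = 1",
                  "large_area = 0", "large_area = 1"),
  no_exempt   = c(197822, 122841, 307320, 9985, 319463, 603),
  with_exempt = c(23683, 219848, 234743, 8682, 243508, 17),
  stringsAsFactors = FALSE
)

# Share of properties in each group that hold the Homestead Exemption.
counts$exemption_rate_pct <- round(
  100 * counts$with_exempt / (counts$no_exempt + counts$with_exempt), 1
)

print(counts[, c("flag", "exemption_rate_pct")])
```

Read this way, an address match moves the exemption rate from roughly 11% to 64%, while the depth flag moves it only from about 43% to 47%. The large-area flag shows a sharp drop (about 3% versus 43%) but applies to only 620 properties.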

Predictor Variables 5 & 6: Potential Commercial Activity

These two predictor variables are based on the key eligibility criteria of the Homestead Exemption program, that a property must not be used exclusively for business or rental purposes (partial use is allowed). We divided this criterion into two sections: one assessing the potential for exclusive business use and the other for exclusive rental use. A property is considered to have rental potential if it holds an active rental license, and business potential if it has an active business license (excluding rental licenses). Both variables are binary (has/does not have).
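The flag construction can be sketched on toy data before reading the full pipeline. This is a simplified illustration in base R: the records are invented, the "Expired" status value is an assumption, and the business flag here treats any active non-rental license as business activity, whereas the report restricts it to a specific list of license types.

```r
# Toy license records: field names mirror the report, values are invented.
licenses <- data.frame(
  opa_account_num = c("0110010", "0110010", "0110020", NA, "0110030"),
  licensetype     = c("Rental", "Rental", "Food Caterer", "Rental", "Rental"),
  licensestatus   = c("Active", "Active", "Active", "Active", "Expired"),
  stringsAsFactors = FALSE
)
properties <- data.frame(
  parcel_number = c("0110010", "0110020", "0110030", "0110040"),
  stringsAsFactors = FALSE
)

# Keep active licenses that can be matched to a property account.
active <- licenses[licenses$licensestatus == "Active" &
                     !is.na(licenses$opa_account_num), ]

# One entry per account, so duplicate licenses do not duplicate properties.
rental_accounts   <- unique(active$opa_account_num[active$licensetype == "Rental"])
business_accounts <- unique(active$opa_account_num[active$licensetype != "Rental"])

# Binary flags: 1 if the property has a matching active license, else 0.
properties$rental_license     <- as.integer(properties$parcel_number %in% rental_accounts)
properties$commercial_license <- as.integer(properties$parcel_number %in% business_accounts)

print(properties)
```

Both keys are kept as character strings so that account numbers with leading zeros still match.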

business_license<-st_read("Data/business_licenses.geojson")
## Reading layer `business_licenses' from data source 
##   `C:\Users\14735\WPSDrive\376583023\WPS云盘\0_MCP\25spring\Smart Cities Practicum\Philly-Homeowners\Data\business_licenses.geojson' 
##   using driver `GeoJSON'
## replacing null geometries with empty geometries
## Simple feature collection with 425305 features and 42 fields (with 22253 geometries empty)
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -75.27421 ymin: 39.88002 xmax: -74.95819 ymax: 40.1374
## Geodetic CRS:  WGS 84
Rental License
rental_license<-business_license%>%
  filter(licensetype=="Rental",
         #rentalcategory=="Residential Dwellings",
         licensestatus %in% c("Active"))%>%
  select(opa_account_num)%>%
  st_drop_geometry()%>%
  distinct() %>% 
  filter(!is.na(opa_account_num)) %>% 
  mutate(rental_license = 1)  
properties_rental <- properties_sf %>%
  mutate(parcel_number = as.character(parcel_number))%>%
  left_join(rental_license, by = c("parcel_number" = "opa_account_num"))%>%
  mutate(rental_license = replace_na(rental_license, 0))
ggplot(properties_rental, aes(x = factor(rental_license), fill = factor(exemption))) +
  geom_bar(position = "dodge") +  
  geom_text(stat = "count", aes(label = after_stat(count), color = factor(exemption)),  
            position = position_dodge(width = 0.9),  
            vjust = -0.5,  
            size = 3) +  
  scale_fill_manual(values = c("0" = "#e42524", "1" = "#00ADA9"),  
                    labels = c("0" = "No Exemption", "1" = "With Exemption")) +  
  scale_color_manual(values = c("0" = "#e42524", "1" = "#00ADA9")) +  
  labs(title = "Rental Licenses Metrics by Homestead Exemption Status",
       x = "Rental Licenses",
       y = "Count",
       fill = "Exemption Status") +
  theme_minimal() +
  theme(legend.position = "none") 

#join to dataset
residential_properties <- residential_properties %>%
  left_join(properties_rental%>%
              st_drop_geometry()%>%
              select(objectid,rental_license), 
            by = c("objectid" = "objectid"))

Business License (Exclude Rental)

commercial_license<-business_license%>%
  filter(licensestatus %in% c("Active"))%>%
  filter(licensetype %in% c(
    "Food Caterer",
    "Food Establishment, Retail Perm Location (Large)",
    "Food Establishment, Retail Permanent Location",
    "Food Manufacturer / Wholesaler",
    "Food Preparing and Serving",
    "Food Preparing and Serving (30+ SEATS)",
    "Motor Vehicle Repair / Retail Mobile Dispensing",
    "Pawn Shop",
    "Precious Metal Dealer",
    "Public Garage / Parking Lot",
    "Residential Property Wholesaler",
    "Tire Dealer",
    "Tow Company",
    "Vacant Commercial Property"
  ))%>%
  select(opa_account_num)%>%
  st_drop_geometry()%>%
  distinct() %>% 
  filter(!is.na(opa_account_num)) %>% 
  mutate(commercial_license = 1) 
properties_commercial <- properties_sf %>%
  mutate(parcel_number = as.character(parcel_number))%>%
  left_join(commercial_license, by = c("parcel_number" = "opa_account_num"))%>%
  mutate(commercial_license = replace_na(commercial_license, 0))
ggplot(properties_commercial, aes(x = factor(commercial_license), fill = factor(exemption))) +
  geom_bar(position = "dodge") +  
  geom_text(stat = "count", aes(label = after_stat(count), color = factor(exemption)),  
            position = position_dodge(width = 0.9),  
            vjust = -0.5,  
            size = 3) +  
  scale_fill_manual(values = c("0" = "#e42524", "1" = "#00ADA9"),  
                    labels = c("0" = "No Exemption", "1" = "With Exemption")) +  
  scale_color_manual(values = c("0" = "#e42524", "1" = "#00ADA9")) +  
  labs(title = "Commercial Licenses Metrics by Homestead Exemption Status",
       x = "Commercial Licenses (Exclude Rental)",
       y = "Count",
       fill = "Exemption Status") +
  theme_minimal() +
  theme(legend.position = "none") 

#join to dataset
residential_properties <- residential_properties %>%
  left_join(properties_commercial%>%
              st_drop_geometry()%>%
              select(objectid,commercial_license), 
            by = c("objectid" = "objectid"))

Predictor Variables 7 & 8: Tax Balance

Tax balance is the total outstanding real estate tax owed in a census tract, including principal, penalties, and interest from previous years. Although it is not part of the Homestead Exemption eligibility criteria, an outstanding balance may affect homeowners’ trust and willingness to apply, and therefore the likelihood of an approved application. The total tax balance in a census tract can also serve as an indicator of broader socioeconomic characteristics, such as income levels, educational attainment, and English proficiency, all of which may affect outreach for the Homestead Exemption program.

In our model, we include two tax balance-related predictor variables: (1) the total tax balance of the census tract where a property is located, and (2) the percentage of properties in that census tract that owe a tax balance.
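The two tract-level variables can be sketched as follows. This is a minimal base R illustration on invented numbers, assuming the tract identifier is an 11-digit GEOID (2-digit state, 3-digit county, 6-digit tract). Characters 6 to 9 keep the four-digit base tract number and drop the two-digit suffix, so sub-tracts such as 0001.01 and 0001.02 collapse into one key and must be summed.

```r
# Toy tract-level balances: GEOIDs follow the 11-digit layout, amounts are invented.
balances <- data.frame(
  census_tract = c("42101000101", "42101000102", "42101000200"),
  balance      = c(50000, 30000, 120000),
  num_props    = c(10, 6, 40),
  stringsAsFactors = FALSE
)

# Base tract number: "0001" -> 1, "0002" -> 2.
balances$census <- as.numeric(substr(balances$census_tract, 6, 9))

# Sum balances and delinquent-property counts within each base tract.
by_tract <- aggregate(cbind(balance, num_props) ~ census, data = balances, FUN = sum)
names(by_tract) <- c("census", "balance_total", "tax_props")

# Hypothetical count of all properties per tract (tract 3 has no balances on file).
total_props <- data.frame(census = c(1, 2, 3), total_props = c(200, 400, 150))

out <- merge(total_props, by_tract, by = "census", all.x = TRUE)
out$balance_total[is.na(out$balance_total)] <- 0
out$tax_props[is.na(out$tax_props)] <- 0

# Share of properties in the tract that owe a balance.
out$balance_rate <- out$tax_props / out$total_props
print(out)
```

Tracts with no balance record are set to zero, which matches the replace-with-zero step used in the report's own code.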

balances <- read.csv("Data/real_estate_tax_balances_census_tract.csv")
balance_sf <- census_tracts_enriched %>% 
  left_join(balances %>% select(census_tract,balance,num_props), 
            by = c("GEOID" = "census_tract")) 


ggplot(balance_sf) +
  geom_sf(aes(fill = balance), color = "white", size = 0.1) + 
  scale_fill_gradientn(colors = c("#00ADA9","#e3f9f7","#f4aa9e", "#e42524"),
                       limits = range(balance_sf$balance, na.rm = TRUE),
                       breaks = range(balance_sf$balance, na.rm = TRUE)
                       ) + 
  labs(title = "Total Tax Balance by Census Tract",
       fill = "Balance ($)") +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    plot.title = element_text(size = 14),
    plot.subtitle = element_text(size = 6),
    axis.title = element_text(size = 6),
    axis.text = element_blank(),
    legend.text = element_text(size = 6),
    legend.position = "bottom",
    legend.direction = "horizontal"
  )

# Visualize the relationship
ggplot(balance_sf, aes(x = balance, y = pct_homestead)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess") +
  labs(
    title = "Tax Balance vs. Homestead Participation",
    x = "Total Tax Balance ($)",
    y = "Homestead Participation Rate (%)"
  ) +
  theme_minimal()

#distinct
balance_distinct <- balances %>%
  mutate(census = as.numeric(substr(census_tract, 6, 9)))%>%
  st_drop_geometry()%>%
  group_by(census) %>% 
  summarise(balance_avg=sum(balance,na.rm=TRUE)/sum(num_props, na.rm = TRUE),
            balance_total=sum(balance,na.rm=TRUE),
            tax_props=sum(num_props,na.rm=TRUE)) %>%
  ungroup()
properties_number<-properties_sf%>%
  select(census_tract,exemption,parcel_number)%>%
  mutate(prop=1)%>%
  st_drop_geometry()%>%
  group_by(census_tract)%>%
  summarise(total_props=sum(prop))%>%
  ungroup()

properties_sf_number<-properties_sf%>%
  left_join(properties_number,by="census_tract")
properties_balance <- properties_sf_number %>%
  select(objectid, census_tract,exemption,parcel_number,total_props)%>%
  left_join(balance_distinct, by = c("census_tract" = "census"))%>%
  mutate(balance_avg = replace(balance_avg, is.na(balance_avg), 0),
         balance_total = replace(balance_total, is.na(balance_total), 0),
         tax_props=replace(tax_props,is.na(tax_props),0))%>%
  mutate(balance_rate=tax_props/total_props)
avg_values <- properties_balance %>%
  st_drop_geometry()%>%
  group_by(exemption) %>%
  summarise(
    'Total Tax Balance (In 100,000)' = mean(balance_total, na.rm = TRUE)/100000,
    #Tax_Props=mean(tax_props,na.rm=TRUE),
    '% of Properties with Tax Balance'=mean(balance_rate,na.rm=TRUE)*100
  ) %>%
  pivot_longer(cols = -exemption, names_to = "variable", values_to = "mean_value")

ggplot(avg_values, aes(x = variable, y = mean_value, fill = as.factor(exemption))) +
  geom_bar(stat = "identity", width = 0.5, position = position_dodge(width = 0.6)) +

  geom_text(aes(label = round(mean_value, 4), color = as.factor(exemption)), 
            position = position_dodge(width = 0.6), 
            vjust = -0.5, size = 5) +

  labs(title = "Mean of Tax Variables by Exemption Status",
       x = "Metrics",
       y = "Mean Value",
       fill = "Exemption Status") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 18),
    plot.subtitle = element_text(size = 14),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12),
    legend.text = element_text(size = 12),
    legend.position = "bottom",
    legend.direction = "horizontal"
  ) +
  
  scale_fill_manual(values = c("#e42524", "#00ADA9"), labels = c("No Exemption", "With Exemption")) +
  scale_color_manual(values = c("#e42524", "#00ADA9"), guide = "none") 

#join to dataset
residential_properties <- residential_properties %>%
  left_join(properties_balance%>%
              st_drop_geometry()%>%
              select(objectid,balance_total,balance_rate), 
            by = c("objectid" = "objectid"))

Predictor Variables 9 & 10: Residential Property Metrics

This section covers two groups of variables, both continuous (numerical):
  • Avg. Market Value and Avg. Taxable Value
  • Sd. Market Value and Sd. Taxable Value

Properties with lower values are more likely to belong to homeowners who qualify for tax relief. Homestead Exemption helps stabilize values by reducing taxable assessments.

Our key findings include:
  • Exempted properties have lower market values and taxable values compared to non-exempt ones.
  • Standard deviation of market value is lower for exempted properties, suggesting more stable valuations.
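One caveat when reading the standard deviation finding: standard deviation is measured in dollars, so it tends to be smaller for lower-valued homes even when the proportional swings are identical. The sketch below uses two invented parcels to show this, and computes the coefficient of variation (standard deviation divided by the mean) as one possible scale-free check. The cv() helper is hypothetical and is not part of the report's model.

```r
# Two hypothetical parcels with the same proportional pattern at different price levels.
low_value  <- c(100000, 110000, 120000)
high_value <- c(400000, 440000, 480000)

# Standard deviation in dollars: four times larger for the higher-valued parcel.
sd_low  <- sd(low_value)
sd_high <- sd(high_value)

# Coefficient of variation: standard deviation relative to the mean.
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
cv_low  <- cv(low_value)
cv_high <- cv(high_value)

print(c(sd_low = sd_low, sd_high = sd_high, cv_low = cv_low, cv_high = cv_high))
```

If the coefficient of variation is also lower for exempted properties, the "more stable valuations" reading is on firmer ground. If not, the difference in standard deviation may mostly reflect the difference in price level.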

# Load and inspect the assessments dataset
assessments2 <- read.csv("Data/assessments.csv")
colnames(assessments2)
## [1] "parcel_number"    "year"             "market_value"     "taxable_land"    
## [5] "taxable_building" "exempt_land"      "exempt_building"  "objectid"
head(assessments2, 10)
##    parcel_number year market_value taxable_land taxable_building exempt_land
## 1       11001000 2020       237000        62094           129906           0
## 2       11001000 2019       218700        57299           121401           0
## 3       11001000 2018       192200        50356           111844           0
## 4       11001000 2017       192200        50356           111844           0
## 5       11001000 2016       192200        30150           132050           0
## 6       11001000 2015       192200        30150           132050           0
## 7       11001100 2025       381300        76260           305040           0
## 8       11001100 2024       339800        67960           271840           0
## 9       11001100 2023       339800        67960           271840           0
## 10      11001100 2022       282300        73963           208337           0
##    exempt_building   objectid
## 1            45000 2840898049
## 2            40000 2840898050
## 3            30000 2840898051
## 4            30000 2840898052
## 5            30000 2840898053
## 6            30000 2840898054
## 7                0 2840898055
## 8                0 2840898056
## 9                0 2840898057
## 10               0 2840898058
# Merge properties and assessments data by parcel_number
cleaned_properties <- filtered_properties %>%
  select(parcel_number, exemption, is_residential, shape)
assessment_combined <- assessments2 %>%
  left_join(cleaned_properties, by = "parcel_number")

head(assessment_combined)
##   parcel_number year market_value taxable_land taxable_building exempt_land
## 1      11001000 2020       237000        62094           129906           0
## 2      11001000 2019       218700        57299           121401           0
## 3      11001000 2018       192200        50356           111844           0
## 4      11001000 2017       192200        50356           111844           0
## 5      11001000 2016       192200        30150           132050           0
## 6      11001000 2015       192200        30150           132050           0
##   exempt_building   objectid exemption is_residential
## 1           45000 2840898049         1              1
## 2           40000 2840898050         1              1
## 3           30000 2840898051         1              1
## 4           30000 2840898052         1              1
## 5           30000 2840898053         1              1
## 6           30000 2840898054         1              1
##                                                  shape
## 1 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
## 2 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
## 3 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
## 4 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
## 5 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
## 6 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
# Filter Residential Data
# Filter the combined dataset to include only residential properties
residential_assessment_combined <- assessment_combined %>%
  filter(is_residential == 1)
head(residential_assessment_combined)
##   parcel_number year market_value taxable_land taxable_building exempt_land
## 1      11001000 2020       237000        62094           129906           0
## 2      11001000 2019       218700        57299           121401           0
## 3      11001000 2018       192200        50356           111844           0
## 4      11001000 2017       192200        50356           111844           0
## 5      11001000 2016       192200        30150           132050           0
## 6      11001000 2015       192200        30150           132050           0
##   exempt_building   objectid exemption is_residential
## 1           45000 2840898049         1              1
## 2           40000 2840898050         1              1
## 3           30000 2840898051         1              1
## 4           30000 2840898052         1              1
## 5           30000 2840898053         1              1
## 6           30000 2840898054         1              1
##                                                  shape
## 1 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
## 2 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
## 3 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
## 4 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
## 5 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
## 6 SRID=2272;POINT  ( 2698365.44997206 228564.73242714)
colnames(residential_assessment_combined)
##  [1] "parcel_number"    "year"             "market_value"     "taxable_land"    
##  [5] "taxable_building" "exempt_land"      "exempt_building"  "objectid"        
##  [9] "exemption"        "is_residential"   "shape"
# Market Value Growth Rate by Exemption Status (2016-2025)

# Calculate yearly market value and growth rate by exemption status
yearly_market_value <- residential_assessment_combined %>%
  group_by(year, exemption) %>%
  summarise(total_market_value = sum(market_value, na.rm = TRUE), .groups = 'drop')


yearly_market_value <- yearly_market_value %>%
  arrange(exemption, year) %>%
  group_by(exemption) %>%
  mutate(market_value_growth_rate = (total_market_value - lag(total_market_value)) / lag(total_market_value) * 100)

# Filter data for the years 2016-2025
yearly_market_value_filtered <- yearly_market_value %>%
  filter(year >= 2016 & year <= 2025)

# Plot the market value growth rate
ggplot(yearly_market_value_filtered, aes(x = year, y = market_value_growth_rate, color = as.factor(exemption))) +
  geom_line(size = 1.2) +
  geom_point(size = 2) +
  scale_x_continuous(breaks = seq(2016, 2025, by = 1)) + 
  scale_y_continuous(labels = percent_format(scale = 1)) + 
  scale_color_manual(values = c("0" = "#E42524", "1" = "#00ADA9"), labels = c("No Exemption", "With Exemption")) +
  labs(title = "Market Value Growth Rate (2016-2025)", 
       x = "Year", 
       y = "Growth Rate (%)", 
       color = "Exemption Status") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(yearly_market_value_filtered)
## # A tibble: 20 × 4
## # Groups:   exemption [2]
##     year exemption total_market_value market_value_growth_rate
##    <int>     <dbl>              <dbl>                    <dbl>
##  1  2016         0        68303720915                  1.60   
##  2  2017         0        70567844512                  3.31   
##  3  2018         0        80287997319                 13.8    
##  4  2019         0        88179071992                  9.83   
##  5  2020         0        93001545386                  5.47   
##  6  2021         0        94757821078                  1.89   
##  7  2022         0        96509983650                  1.85   
##  8  2023         0       114634228622                 18.8    
##  9  2024         0       116744008982                  1.84   
## 10  2025         0       129886267715                 11.3    
## 11  2016         1        38156905833                  0.708  
## 12  2017         1        38443943033                  0.752  
## 13  2018         1        38667839100                  0.582  
## 14  2019         1        43364362090                 12.1    
## 15  2020         1        44984523550                  3.74   
## 16  2021         1        44981876450                 -0.00588
## 17  2022         1        45009628972                  0.0617 
## 18  2023         1        56848412909                 26.3    
## 19  2024         1        56869572229                  0.0372 
## 20  2025         1        68185163435                 19.9
# Calculate average growth rate by exemption status
growth_comparison <- yearly_market_value_filtered %>%
  group_by(exemption) %>%
  summarise(market_value_growth_rate = mean(market_value_growth_rate, na.rm = TRUE))

print(growth_comparison)
## # A tibble: 2 × 2
##   exemption market_value_growth_rate
##       <dbl>                    <dbl>
## 1         0                     6.96
## 2         1                     6.42
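The growth rates in the yearly table can be reproduced by hand. The sketch below takes the first four totals for non-exempt properties from the printed output and applies the same year-over-year formula in base R. The compound annual growth rate at the end is an additional summary shown for comparison, not something the report computes.

```r
# Total market value for non-exempt properties, copied from the yearly table above.
years <- 2016:2019
total_market_value <- c(68303720915, 70567844512, 80287997319, 88179071992)

# Year-over-year growth: (value_t - value_{t-1}) / value_{t-1} * 100.
# The first year has no predecessor in this excerpt, so it is NA.
yoy <- c(NA, diff(total_market_value) / head(total_market_value, -1) * 100)
print(data.frame(year = years, growth_pct = round(yoy, 2)))

# Compound annual growth rate over the same span (alternative to averaging yearly rates).
n_periods <- length(years) - 1
cagr <- ((tail(total_market_value, 1) / total_market_value[1])^(1 / n_periods) - 1) * 100
print(round(cagr, 2))
```

Averaging yearly rates, as the report does, weights each year equally. The compound rate instead describes the constant yearly growth that would produce the same start and end values.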

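For each parcel, we summarize the assessment history with three kinds of metrics: mean values for market_value, taxable_land, taxable_building, exempt_land, and exempt_building; the growth rate of market value (percent change between the first and last assessment on record); and the standard deviation (to measure volatility).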
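A first-to-last growth rate depends on which rows count as first and last. In the preview of the assessments file printed earlier, each parcel's rows are listed newest-first, so the sketch below shows how the sign of the result changes with row order. It uses the six rows for parcel 11001000 from that preview. pct_change() is a hypothetical helper.

```r
# Assessment history for parcel 11001000, as printed by head(assessments2, 10)
# (rows arrive newest-first).
parcel <- data.frame(
  year         = c(2020, 2019, 2018, 2017, 2016, 2015),
  market_value = c(237000, 218700, 192200, 192200, 192200, 192200)
)

# Percent change from a starting value to an ending value.
pct_change <- function(from, to) (to - from) / from * 100

# Taking rows in file order treats 2020 as the starting value.
file_order <- pct_change(parcel$market_value[1], parcel$market_value[nrow(parcel)])

# Sorting by year first measures the change from 2015 to 2020.
sorted <- parcel[order(parcel$year), ]
chronological <- pct_change(sorted$market_value[1], sorted$market_value[nrow(sorted)])

print(round(c(file_order = file_order, chronological = chronological), 2))
```

The chronological version gives a gain of about 23%, while the file-order version reports a decline for the same parcel.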

# Summarize key metrics for each parcel: mean, growth rate, and standard deviation
residential_summary <- residential_assessment_combined %>%
  group_by(parcel_number) %>%
  summarise(
    avg_market_value = mean(market_value, na.rm = TRUE),
    avg_taxable_land = mean(taxable_land, na.rm = TRUE),
    avg_taxable_building = mean(taxable_building, na.rm = TRUE),
    avg_exempt_land = mean(exempt_land, na.rm = TRUE),
    avg_exempt_building = mean(exempt_building, na.rm = TRUE),
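    # order_by = year so first()/last() refer to the earliest/latest year regardless of row order
    growth_market_value = (last(market_value, order_by = year) - first(market_value, order_by = year)) / first(market_value, order_by = year) * 100,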
    sd_market_value = sd(market_value, na.rm = TRUE),
    sd_taxable_land = sd(taxable_land, na.rm = TRUE),
    sd_taxable_building = sd(taxable_building, na.rm = TRUE)
  )
# Add exemption and residential status to the summary dataset
residential_summary <- residential_summary %>%
  left_join(
    residential_assessment_combined %>%
      select(parcel_number, exemption, is_residential, shape) %>%
      distinct(parcel_number, .keep_all = TRUE),  # Keep one row per parcel_number
    by = "parcel_number"
  )

head(residential_summary)
## # A tibble: 6 × 13
##   parcel_number avg_market_value avg_taxable_land avg_taxable_building
##           <dbl>            <dbl>            <dbl>                <dbl>
## 1      11000001          112700           112700                     0
## 2      11000002          106600           106600                     0
## 3      11000003          106600           106600                     0
## 4      11000004           90333.           90333.                    0
## 5      11000005          106600           106600                     0
## 6      11000006          144533.          144533.                    0
## # ℹ 9 more variables: avg_exempt_land <dbl>, avg_exempt_building <dbl>,
## #   growth_market_value <dbl>, sd_market_value <dbl>, sd_taxable_land <dbl>,
## #   sd_taxable_building <dbl>, exemption <dbl>, is_residential <dbl>,
## #   shape <chr>
# Analysis by Exemption Status
# Summarize metrics by exemption status
residential_summary_analysis <- residential_summary %>%
  group_by(exemption) %>%
  summarise(
    avg_market_value = mean(avg_market_value, na.rm = TRUE),
    avg_taxable_land = mean(avg_taxable_land, na.rm = TRUE),
    avg_taxable_building = mean(avg_taxable_building, na.rm = TRUE),
    avg_exempt_land = mean(avg_exempt_land, na.rm = TRUE),
    avg_exempt_building = mean(avg_exempt_building, na.rm = TRUE),
    sd_market_value = mean(sd_market_value, na.rm = TRUE),
    sd_taxable_land = mean(sd_taxable_land, na.rm = TRUE),
    sd_taxable_building = mean(sd_taxable_building, na.rm = TRUE)
  ) 

# Convert the summary data to long format for visualization
library(tidyr)
summary_residential_long <- residential_summary_analysis %>%
  pivot_longer(
    cols = -exemption, 
    names_to = "metric", 
    values_to = "value"
  )
# Plot residential property metrics by exemption status
library(ggplot2)
custom_colors <- c("0" = "#E42524",  
                   "1" = "#00ADA9")  

ggplot(summary_residential_long, aes(x = metric, y = value, fill = as.factor(exemption))) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = custom_colors, labels = c("No Exemption", "With Exemption")) +
  labs(title = "Residential Property Metrics by Homestead Exemption Status (2015-2025)",
       x = "Metric", 
       y = "Value", 
       fill = "Exemption Status") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

  • Exemptions are more common in lower-value, owner-occupied homes, which tend to have less price volatility.
  • Higher-value homes may be more influenced by market forces, leading to greater fluctuations.
#Plot Only "avg_market_value" and "sd_market_value"

summary_filtered <- summary_residential_long %>%
  filter(metric %in% c("avg_market_value", "sd_market_value"))

custom_colors <- c("0" = "#E42524",  
                   "1" = "#00ADA9") 

ggplot(summary_filtered, aes(x = metric, y = value, fill = as.factor(exemption), color = as.factor(exemption))) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.7), width = 0.6, alpha = 0.85) + 
  geom_text(aes(label = round(value, 0)), 
            position = position_dodge(width = 0.7), 
            vjust = -0.5, size = 4.5, fontface = "bold") + 
  ylim(0, max(summary_filtered$value) * 1.2) +
  scale_fill_manual(values = custom_colors, labels = c("No Exemption", "With Exemption")) +
  scale_color_manual(values = custom_colors, guide = "none") +  
  labs(title = "Average and Standard Deviation of Market Value (2015-2025)",
       subtitle = "Comparison of Properties With and Without Exemption",
       x = "Metric", 
       y = "Value ($)", 
       fill = "Exemption Status") +
  theme_minimal(base_size = 14) +
  theme(axis.text.x = element_text(angle = 0, vjust = 0.5, hjust = 0.5, face = "bold"),
        axis.title = element_text(face = "bold"),
        legend.position = "top",
        plot.title = element_text(face = "bold", size = 16),
        plot.subtitle = element_text(size = 13, color = "gray40"))

Predictor Variables 11 & 12 & 13: Property Transfer Types and Exemption Correlation

Key Findings:
  • Certain deed types (e.g., Deed - Deceased, Satisfaction of Mortgage, Mortgage) have a higher exemption rate, while others (e.g., Deed Land Bank, Declaration of Condominium) have a much lower one.
Possible Reason:
  • The higher-rate transactions indicate long-term ownership, inheritance, or financial restructuring, which may qualify properties for exemption.
Predictive Data to Use:
  • Property deed type, categorical (nominal)
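One point to keep in mind when reading the counts in this section: a property can have several recorded documents, so joining transfer records onto properties produces one row per property-document pair rather than one row per property. The sketch below shows the effect with base R's merge() on invented data. The column names follow the report.

```r
# Toy data: three properties, with invented transfer records.
properties <- data.frame(
  parcel_number = c(1, 2, 3),
  exemption     = c(1, 0, 0)
)
transfers <- data.frame(
  opa_account_num = c(1, 1, 2),
  document_type   = c("MORTGAGE", "SATISFACTION OF MORTGAGE", "DEED"),
  stringsAsFactors = FALSE
)

# Left join: keep every property, repeat it once per matching document.
joined <- merge(properties, transfers,
                by.x = "parcel_number", by.y = "opa_account_num", all.x = TRUE)

# Properties without any record get the "Unknown" label, as in the report.
joined$document_type[is.na(joined$document_type)] <- "Unknown"

print(joined)
print(c(properties = nrow(properties), joined_rows = nrow(joined)))
```

Here three properties become four rows, so a per-document-type total counts documents, and a property with both a mortgage and a satisfaction of mortgage contributes to both groups.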

transfers2 <- read.csv("Data/RTT_SUMMARY.csv")
residential_transfers <- residential_properties %>%
  left_join(transfers2, by = c("parcel_number" = "opa_account_num"))
# Deed Type Analysis
residential_transfers <- residential_transfers %>%
  mutate(document_type = ifelse(is.na(document_type), "Unknown", document_type))
exemption_by_document_type <- residential_transfers %>%
  group_by(document_type) %>%
  summarise(
    total_count = n(),
    exemption_count = sum(exemption == 1),
    exemption_proportion = exemption_count / total_count * 100
  ) %>%
  arrange(desc(exemption_proportion))

print(exemption_by_document_type)
## # A tibble: 30 × 4
##    document_type                total_count exemption_count exemption_proportion
##    <chr>                              <int>           <int>                <dbl>
##  1 "NOTARY COMMISSION"                    1               1                100  
##  2 "DEED - DECEASED "                   217             145                 66.8
##  3 "SATISFACTION OF MORTGAGE"        199081           99458                 50.0
##  4 "MORTGAGE"                        217667          100540                 46.2
##  5 "Unknown"                         302682          129494                 42.8
##  6 "ALL OTHER MISCELLANEOUS IN…          75              32                 42.7
##  7 "ASSIGNMENT OF MORTGAGE"           57498           23583                 41.0
##  8 "POWER OF ATTORNEY"                 2859            1042                 36.4
##  9 "DECLARATION OF PLANNED COM…         230              78                 33.9
## 10 "DEED - ADVERSE POSSESSION"            3               1                 33.3
## # ℹ 20 more rows
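The proportions in this table can be cross-checked against the binomial model fitted next, whose coefficients are on the log-odds scale. The sketch below copies four estimates from the model summary and converts them to odds ratios and predicted exemption rates. The variable names are illustrative, and plogis() is base R's inverse logit.

```r
# Estimates copied from summary(document_exemption_model) in this section.
intercept <- -1.48638
coefs <- c(
  "DEED - DECEASED"          = 2.18645,
  "SATISFACTION OF MORTGAGE" = 1.48472,
  "MORTGAGE"                 = 1.33367,
  "DEED LAND BANK"           = -3.49996
)

# exp(coef) is the odds ratio relative to the reference document type.
odds_ratio <- exp(coefs)

# Inverse logit of (intercept + coef) gives the predicted exemption rate in percent.
pred_pct <- 100 * plogis(intercept + coefs)

print(round(data.frame(odds_ratio, pred_pct), 2))
```

With a single categorical predictor, the predicted rates reproduce the group proportions: about 66.8% for Deed - Deceased, 50.0% for Satisfaction of Mortgage, and 46.2% for Mortgage, matching the table above. Deed Land Bank, with a large negative coefficient, comes out below 1%.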
## binomial model:DEED - DECEASED, SATISFACTION OF MORTGAGE, MORTGAGE, ASSIGNMENT OF MORTGAGE
document_exemption_model <- glm(exemption ~ document_type, data = residential_transfers, family = binomial)

summary(document_exemption_model)
## 
## Call:
## glm(formula = exemption ~ document_type, family = binomial, data = residential_transfers)
## 
## Coefficients:
##                                                            Estimate Std. Error
## (Intercept)                                                -1.48638    0.25404
## document_typeALL OTHER MISCELLANEOUS INSTRUMENTS            1.19091    0.34502
## document_typeAMENDMENT                                     -0.77240    0.28647
## document_typeAMENDMENT TO DECLARATION OF CONDOMINIUM       -0.19468    0.26273
## document_typeAMENDMENT TO DECLARATION OF PLANNED COMMUNITY -4.39121    1.02484
## document_typeASSIGNMENT                                    -2.63044    0.33276
## document_typeASSIGNMENT OF MORTGAGE                         1.12305    0.25418
## document_typeCERTIFICATE OF STOCK TRANSFER                 -2.85526    0.48309
## document_typeCONTINUATION                                  -0.82003    0.26309
## document_typeDECLARATION OF CONDOMINIUM                    -3.82189    0.51497
## document_typeDECLARATION OF PLANNED COMMUNITY               0.81921    0.28972
## document_typeDEED                                           0.76405    0.25409
## document_typeDEED - ADVERSE POSSESSION                      0.79323    1.25081
## document_typeDEED - DECEASED                                2.18645    0.29210
## document_typeDEED LAND BANK                                -3.49996    0.51560
## document_typeDEED OF CONDEMNATION                          -1.15268    0.44549
## document_typeDEED RTT - OTHER                              -0.10552    0.34185
## document_typeDM - LIS PENDENS                              -0.09554    0.27046
## document_typeMISCELLANEOUS DEED                            -0.21891    0.25429
## document_typeMISCELLANEOUS DEED TAXABLE                    -3.29275    0.75412
## document_typeMORTGAGE                                       1.33367    0.25408
## document_typeNOTARY COMMISSION                             10.05214   43.95469
## document_typeORIGINAL FINANCING STATEMENT                  -0.04482    0.25502
## document_typePOWER OF ATTORNEY                              0.93033    0.25699
## document_typeRELEASE                                       -0.86500    0.78240
## document_typeRELEASE OF MORTGAGE                           -0.26764    0.25673
## document_typeSATISFACTION OF MORTGAGE                       1.48472    0.25408
## document_typeSHERIFF'S DEED                                -0.87434    0.26192
## document_typeTERMINATION                                   -0.21761    0.25728
## document_typeUnknown                                        1.19563    0.25407
##                                                            z value
## (Intercept)                                                 -5.851
## document_typeALL OTHER MISCELLANEOUS INSTRUMENTS             3.452
## document_typeAMENDMENT                                      -2.696
## document_typeAMENDMENT TO DECLARATION OF CONDOMINIUM        -0.741
## document_typeAMENDMENT TO DECLARATION OF PLANNED COMMUNITY  -4.285
## document_typeASSIGNMENT                                     -7.905
## document_typeASSIGNMENT OF MORTGAGE                          4.418
## document_typeCERTIFICATE OF STOCK TRANSFER                  -5.910
## document_typeCONTINUATION                                   -3.117
## document_typeDECLARATION OF CONDOMINIUM                     -7.422
## document_typeDECLARATION OF PLANNED COMMUNITY                2.828
## document_typeDEED                                            3.007
## document_typeDEED - ADVERSE POSSESSION                       0.634
## document_typeDEED - DECEASED                                 7.485
## document_typeDEED LAND BANK                                 -6.788
## document_typeDEED OF CONDEMNATION                           -2.587
## document_typeDEED RTT - OTHER                               -0.309
## document_typeDM - LIS PENDENS                               -0.353
## document_typeMISCELLANEOUS DEED                             -0.861
## document_typeMISCELLANEOUS DEED TAXABLE                     -4.366
## document_typeMORTGAGE                                        5.249
## document_typeNOTARY COMMISSION                               0.229
## document_typeORIGINAL FINANCING STATEMENT                   -0.176
## document_typePOWER OF ATTORNEY                               3.620
## document_typeRELEASE                                        -1.106
## document_typeRELEASE OF MORTGAGE                            -1.043
## document_typeSATISFACTION OF MORTGAGE                        5.844
## document_typeSHERIFF'S DEED                                 -3.338
## document_typeTERMINATION                                    -0.846
## document_typeUnknown                                         4.706
##                                                                       Pr(>|z|)
## (Intercept)                                                0.00000000488747010
## document_typeALL OTHER MISCELLANEOUS INSTRUMENTS                      0.000557
## document_typeAMENDMENT                                                0.007013
## document_typeAMENDMENT TO DECLARATION OF CONDOMINIUM                  0.458718
## document_typeAMENDMENT TO DECLARATION OF PLANNED COMMUNITY 0.00001829077280808
## document_typeASSIGNMENT                                    0.00000000000000268
## document_typeASSIGNMENT OF MORTGAGE                        0.00000994866804138
## document_typeCERTIFICATE OF STOCK TRANSFER                 0.00000000341148639
## document_typeCONTINUATION                                             0.001827
## document_typeDECLARATION OF CONDOMINIUM                    0.00000000000011571
## document_typeDECLARATION OF PLANNED COMMUNITY                         0.004690
## document_typeDEED                                                     0.002639
## document_typeDEED - ADVERSE POSSESSION                                0.525969
## document_typeDEED - DECEASED                               0.00000000000007139
## document_typeDEED LAND BANK                                0.00000000001136266
## document_typeDEED OF CONDEMNATION                                     0.009670
## document_typeDEED RTT - OTHER                                         0.757582
## document_typeDM - LIS PENDENS                                         0.723894
## document_typeMISCELLANEOUS DEED                                       0.389303
## document_typeMISCELLANEOUS DEED TAXABLE                    0.00001263599788511
## document_typeMORTGAGE                                      0.00000015283707518
## document_typeNOTARY COMMISSION                                        0.819107
## document_typeORIGINAL FINANCING STATEMENT                             0.860491
## document_typePOWER OF ATTORNEY                                        0.000295
## document_typeRELEASE                                                  0.268915
## document_typeRELEASE OF MORTGAGE                                      0.297172
## document_typeSATISFACTION OF MORTGAGE                      0.00000000511078389
## document_typeSHERIFF'S DEED                                           0.000843
## document_typeTERMINATION                                              0.397663
## document_typeUnknown                                       0.00000252637981775
##                                                               
## (Intercept)                                                ***
## document_typeALL OTHER MISCELLANEOUS INSTRUMENTS           ***
## document_typeAMENDMENT                                     ** 
## document_typeAMENDMENT TO DECLARATION OF CONDOMINIUM          
## document_typeAMENDMENT TO DECLARATION OF PLANNED COMMUNITY ***
## document_typeASSIGNMENT                                    ***
## document_typeASSIGNMENT OF MORTGAGE                        ***
## document_typeCERTIFICATE OF STOCK TRANSFER                 ***
## document_typeCONTINUATION                                  ** 
## document_typeDECLARATION OF CONDOMINIUM                    ***
## document_typeDECLARATION OF PLANNED COMMUNITY              ** 
## document_typeDEED                                          ** 
## document_typeDEED - ADVERSE POSSESSION                        
## document_typeDEED - DECEASED                               ***
## document_typeDEED LAND BANK                                ***
## document_typeDEED OF CONDEMNATION                          ** 
## document_typeDEED RTT - OTHER                                 
## document_typeDM - LIS PENDENS                                 
## document_typeMISCELLANEOUS DEED                               
## document_typeMISCELLANEOUS DEED TAXABLE                    ***
## document_typeMORTGAGE                                      ***
## document_typeNOTARY COMMISSION                                
## document_typeORIGINAL FINANCING STATEMENT                     
## document_typePOWER OF ATTORNEY                             ***
## document_typeRELEASE                                          
## document_typeRELEASE OF MORTGAGE                              
## document_typeSATISFACTION OF MORTGAGE                      ***
## document_typeSHERIFF'S DEED                                ***
## document_typeTERMINATION                                      
## document_typeUnknown                                       ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1408886  on 1043658  degrees of freedom
## Residual deviance: 1360982  on 1043629  degrees of freedom
## AIC: 1361042
## 
## Number of Fisher Scoring iterations: 7
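
The coefficients above are on the log-odds scale, relative to the omitted reference document type. As a reading aid, the following sketch exponentiates a few of the printed estimates into odds ratios and derives McFadden's pseudo R-squared from the two printed deviances. The vector names and the choice of McFadden's measure are illustrative assumptions; the numbers are copied from the summary output rather than recomputed from the data.

```r
# Estimates copied from the glm summary above (log-odds vs. the reference level)
log_odds <- c(
  "MORTGAGE"                 = 1.33367,
  "POWER OF ATTORNEY"        = 0.93033,
  "SATISFACTION OF MORTGAGE" = 1.48472,
  "SHERIFF'S DEED"           = -0.87434
)

# Odds ratio > 1: higher odds of holding an exemption than the reference type
odds_ratios <- exp(log_odds)
print(round(odds_ratios, 2))

# McFadden's pseudo R-squared from the printed deviances
# (assumption: used here only as a rough measure of fit)
null_deviance     <- 1408886
residual_deviance <- 1360982
mcfadden_r2 <- 1 - residual_deviance / null_deviance
print(round(mcfadden_r2, 3))
```

The pseudo R-squared of roughly 0.034 suggests that document type on its own explains only a small share of the variation in exemption status, even though many individual levels are statistically significant. Estimates with very large standard errors, such as NOTARY COMMISSION (10.05 with a standard error of 43.95), likely reflect very few observations in that category and should not be read as odds ratios at face value.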

Predictor Variables 11 & 12 & 13: Minimal Financial Consideration Transfers

Key Findings:
- Among residential transfers with minimal financial consideration (≤ $10), 40.8% involve properties that hold a homestead exemption.

Possible Reason:
- These transfers often occur in family transactions, estate planning, or financial hardship cases, aligning with exemption criteria.

Predictive Data to Use:
- Total transaction value: continuous (numerical)
- Whether the property has a minimal financial consideration transfer: binary (0 = No, 1 = Yes)

# Minimal Financial Consideration Analysis

minimal_threshold <- 10 

minimal_financial_transfers <- residential_transfers %>%
  filter(total_consideration <= minimal_threshold)
exemption_summary <- minimal_financial_transfers %>%
  group_by(exemption) %>%
  summarise(
    count = n(),
    proportion = count / nrow(minimal_financial_transfers) * 100
  )

print(exemption_summary)
## # A tibble: 2 × 3
##   exemption  count proportion
##       <dbl>  <int>      <dbl>
## 1         0 279647       59.2
## 2         1 192950       40.8
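
Note that the summary above gives the share of minimal-consideration transfers that are exempt, which is not the same as the share of exempt properties with such a transfer. The sketch below, on a small made-up table, shows one way the binary predictor could be built per parcel and how the two conditional shares differ. The toy data and the names `toy_transfers`, `parcel_flags`, and `has_minimal_transfer` are hypothetical; only `parcel_number`, `total_consideration`, and `exemption` come from the code above.

```r
library(dplyr)

minimal_threshold <- 10

# Hypothetical toy data standing in for residential_transfers
toy_transfers <- tibble(
  parcel_number       = c(1, 1, 2, 3, 4, 4, 5),
  total_consideration = c(1, 150000, 0, 90000, 10, 5, 1),
  exemption           = c(1, 1, 0, 1, 0, 0, 1)
)

# One row per parcel: 1 if any transfer was at or below the threshold
parcel_flags <- toy_transfers %>%
  group_by(parcel_number) %>%
  summarise(
    has_minimal_transfer = as.integer(any(total_consideration <= minimal_threshold)),
    exemption = first(exemption),
    .groups = "drop"
  )

# Share of exempt parcels that have a minimal-consideration transfer
share_of_exempt <- parcel_flags %>%
  filter(exemption == 1) %>%
  summarise(pct = mean(has_minimal_transfer) * 100) %>%
  pull(pct)

# Share of minimal-consideration parcels that are exempt
share_of_minimal <- parcel_flags %>%
  filter(has_minimal_transfer == 1) %>%
  summarise(pct = mean(exemption) * 100) %>%
  pull(pct)

print(parcel_flags)
print(c(share_of_exempt = share_of_exempt, share_of_minimal = share_of_minimal))
```

The per-parcel flag `has_minimal_transfer` is the form that could enter a model as a 0/1 predictor, while the two shares answer different questions and should be reported with their denominators stated.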
# Recent (2-year) Transfer Analysis
library(dplyr)
library(lubridate)

# Convert recording_date to Date format
residential_transfers <- residential_transfers %>%
  mutate(recording_date = as.Date(recording_date.x, format="%Y-%m-%d"))

# Get the current year
current_year <- year(Sys.Date())
# Filter and summarize recent transfers
residential_transfers_2y <- residential_transfers %>%
  filter(!is.na(recording_date)) %>%
  group_by(parcel_number) %>%
  summarise(
    latest_transfer_year = max(year(recording_date), na.rm = TRUE),  # Most recent transfer year
    exemption = first(exemption),  # Retain exemption status
    .groups = "drop"
  ) %>%
  mutate(
    latest_transfer_year = ifelse(is.infinite(latest_transfer_year), NA, latest_transfer_year)  # Handle infinite values
  )

# Mark properties with recent transfers (within 2 years)
residential_transfers_2y <- residential_transfers_2y %>%
  mutate(has_recent_transfer = ifelse(!is.na(latest_transfer_year) & latest_transfer_year >= (current_year - 2), 1, 0))

# View results
print(head(residential_transfers_2y))
## # A tibble: 6 × 4
##   parcel_number latest_transfer_year exemption has_recent_transfer
##           <dbl>                <int>     <dbl>               <dbl>
## 1      11000001                 2021         0                   0
## 2      11000002                 2021         0                   0
## 3      11000003                 2021         0                   0
## 4      11000004                 2021         0                   0
## 5      11000006                 2021         0                   0
## 6      11000010                 2021         0                   0
# Calculate recent transfer rates by exemption status
exemption_transfer_analysis_residential <- residential_transfers_2y %>%
  group_by(exemption) %>%
  summarise(
    avg_recent_transfer = mean(has_recent_transfer, na.rm = TRUE) * 100  # Convert to percentage
  )

# View results
print(exemption_transfer_analysis_residential)
## # A tibble: 2 × 2
##   exemption avg_recent_transfer
##       <dbl>               <dbl>
## 1         0                9.52
## 2         1                4.24

Among properties without a homestead exemption, 9.52% had a transfer in the past two years, compared with 4.24% of those with an exemption. Properties without an exemption are therefore more than twice as likely to have changed hands recently.
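
Because `current_year` is taken from `Sys.Date()`, the two-year window shifts each time the analysis is re-run, and the reported rates shift with it. A minimal sketch of pinning the window to an explicit reference year follows; the helper name `flag_recent_transfer` and the reference year of 2023 are assumptions chosen for illustration, not the values used in this analysis.

```r
# Flag parcels whose latest transfer falls within `window_years` of a fixed
# reference year, so results do not depend on the date the code is run.
flag_recent_transfer <- function(latest_transfer_year, reference_year, window_years = 2) {
  as.integer(!is.na(latest_transfer_year) &
               latest_transfer_year >= (reference_year - window_years))
}

# Hypothetical latest-transfer years for five parcels (one missing)
latest_years <- c(2021, 2019, 2023, NA, 2020)
recent_flag <- flag_recent_transfer(latest_years, reference_year = 2023)
print(recent_flag)
```

Recording the reference year alongside the results makes it possible to reproduce the same flags later.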

# Create a contingency table for exemption and recent transfers
exemption_transfer_table <- table(residential_transfers_2y$exemption, residential_transfers_2y$has_recent_transfer)

# Perform chi-square test
chi_test_result <- chisq.test(exemption_transfer_table)

# View test results
print(chi_test_result)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  exemption_transfer_table
## X-squared = 5721.6, df = 1, p-value < 0.00000000000000022
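
With a sample this large, almost any difference yields a very small p-value, so the test above establishes that an association exists but not how strong it is. The sketch below computes two simple effect-size measures for a 2 x 2 table: the phi coefficient and the ratio of recent-transfer rates. The counts are made up for illustration and are not the project's data.

```r
# Hypothetical 2 x 2 counts: rows = exemption (0/1), cols = recent transfer (0/1)
toy_table <- matrix(
  c(9000, 1000,
    9500,  500),
  nrow = 2, byrow = TRUE,
  dimnames = list(exemption = c("0", "1"), has_recent_transfer = c("0", "1"))
)

# Chi-squared statistic without continuity correction, as used in the phi formula
chi_sq <- unname(chisq.test(toy_table, correct = FALSE)$statistic)

# Phi coefficient: sqrt(chi-squared / n), from 0 (no association) to 1 (perfect)
phi <- sqrt(chi_sq / sum(toy_table))

# Ratio of recent-transfer rates, non-exempt relative to exempt
transfer_rates <- toy_table[, "1"] / rowSums(toy_table)
rate_ratio <- transfer_rates[["0"]] / transfer_rates[["1"]]

print(c(phi = phi, rate_ratio = rate_ratio))
```

Applied to the rates printed above (9.52% and 4.24%), the rate ratio is about 2.2.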

Predictor Variables 14 & 15 & 16: Census Data

Median Home Value
# Scatterplot of homestead rates vs median home values
ggplot(census_tracts_enriched %>% 
       filter(pop_density > 0), 
       aes(x = median_home_value, y = pct_homestead)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess", se = TRUE) +
  scale_x_continuous(labels = scales::dollar_format()) +
  theme_minimal() +
  labs(
    title = "Homestead Exemption Rates vs. Home Values",
    x = "Median Home Value",
    y = "Percentage with Homestead Exemption"
  )

# Fit a linear model (lm) of homestead rate on median home value
homestead_model <- lm(pct_homestead ~ median_home_value, 
                     data = census_tracts_enriched %>% 
                     filter(pop_density > 0))

# View the summary statistics
summary(homestead_model)
## 
## Call:
## lm(formula = pct_homestead ~ median_home_value, data = census_tracts_enriched %>% 
##     filter(pop_density > 0))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.514 -13.480  -0.905  11.873  42.103 
## 
## Coefficients:
##                       Estimate   Std. Error t value            Pr(>|t|)    
## (Intercept)       42.793398855  1.606096340  26.644 <0.0000000000000002 ***
## median_home_value  0.000003141  0.000005130   0.612               0.541    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.05 on 374 degrees of freedom
##   (15 observations deleted due to missingness)
## Multiple R-squared:  0.001001,   Adjusted R-squared:  -0.00167 
## F-statistic: 0.3749 on 1 and 374 DF,  p-value: 0.5407
# Visualize with linear fit instead of loess
ggplot(census_tracts_enriched %>% 
       filter(pop_density > 0), 
       aes(x = median_home_value, y = pct_homestead)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE) +  # Changed from loess to lm
  scale_x_continuous(labels = scales::dollar_format()) +
  theme_minimal() +
  labs(
    title = "Linear Relationship: Homestead Exemption Rates vs. Home Values",
    x = "Median Home Value",
    y = "Percentage with Homestead Exemption"
  )

The scatterplot shows the relationship between median home values (x-axis) and homestead exemption rates (y-axis) across Philadelphia census tracts. The loess trend suggests:

Homestead exemption rates increase with home values up to around $250,000.

Peak participation (around 50%) occurs in the $200,000-$300,000 range.

Participation declines slightly for higher-value homes.

The widening gray band at higher home values indicates more uncertainty in the trend, likely because there are fewer tracts in that range.

Participation rates vary widely at every home value, as shown by the vertical spread of points.

The linear model is consistent with this picture: median home value alone explains almost none of the variation in homestead rates (R-squared = 0.001, p = 0.54), so any relationship is weak and, at best, non-linear.
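
A rise-then-decline pattern like the one described cannot be captured by a straight line, which may be why the linear fit shows no relationship. One way to check for curvature is to compare a linear model against a quadratic one with an F-test. The sketch below does this on simulated tract-level data; the simulated values, the seed, and the object names are assumptions for illustration only and say nothing about the actual Philadelphia tracts.

```r
set.seed(42)

# Simulated tracts: home value in dollars, participation peaking near $250,000
n_tracts <- 376
sim_tracts <- data.frame(
  median_home_value = runif(n_tracts, min = 50000, max = 800000)
)
value_100k <- sim_tracts$median_home_value / 100000
sim_tracts$pct_homestead <- 50 - 1.5 * (value_100k - 2.5)^2 +
  rnorm(n_tracts, mean = 0, sd = 10)

# Straight-line fit vs. a fit that allows one bend
linear_fit    <- lm(pct_homestead ~ median_home_value, data = sim_tracts)
quadratic_fit <- lm(pct_homestead ~ poly(median_home_value, 2), data = sim_tracts)

# F-test: does adding the squared term significantly reduce residual error?
curvature_test <- anova(linear_fit, quadratic_fit)
print(curvature_test)
```

Running the same comparison on `census_tracts_enriched` would indicate whether the bend seen in the loess curve is distinguishable from noise.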

Owner Occupancy
# Owner Occupancy vs Homestead Rates Analysis
occupancy_analysis <- census_tracts_enriched %>%
  filter(pop_density > 0) %>%
  mutate(
    owner_occ_rate = (owner_hh / occupied_units) * 100,
    pct_homestead_owners = (homestead_count / owner_hh) * 100
  ) %>%
  select(GEOID, owner_occ_rate, pct_homestead_owners)

# Visualize the relationship
ggplot(occupancy_analysis, aes(x = owner_occ_rate, y = pct_homestead_owners)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess") +
  labs(
    title = "Owner Occupancy Rate vs. Homestead Participation",
    x = "Owner Occupancy Rate (%)",
    y = "Homestead Participation Rate (%)"
  ) +
  theme_minimal()

ggsave("outputs/homestead-exemption-distribution.png", census_hist, width = 10, height = 6)

This scatter plot reveals important patterns in homestead exemption participation across Philadelphia’s neighborhoods. By comparing census tract data on owner occupancy rates (from the 2022 5-year ACS) with homestead exemption enrollment, we can identify areas where participation could be improved.

The data shows that most Philadelphia census tracts have owner occupancy rates between 25% and 75%, and that generally over half of eligible homeowners participate in the program. Even so, there are clear opportunities for improvement. Particularly concerning are:

Census tracts with participation rates below 50%

Areas with high owner occupancy but low program participation

Neighborhoods falling well below the expected participation rate (shown by the blue trend line)

Notably, higher rates of owner occupancy don’t automatically translate to higher program participation. This suggests that other factors beyond home ownership - such as awareness of the program, ease of enrollment, or demographic characteristics - may play more significant roles in determining participation rates. These insights can help guide targeted outreach efforts to increase program enrollment among eligible homeowners who are currently missing out on this tax benefit.
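
To turn the three concerns listed above into a concrete tract list, one option is to flag tracts that combine high owner occupancy with low participation. The sketch below uses made-up tracts; the GEOID values and both cutoffs (60% owner occupancy, 50% participation) are hypothetical and would need to be set by the outreach team.

```r
# Hypothetical tract-level rates in the same shape as occupancy_analysis
toy_occupancy <- data.frame(
  GEOID                = c("T001", "T002", "T003", "T004"),
  owner_occ_rate       = c(72, 35, 65, 80),
  pct_homestead_owners = c(41, 38, 77, 49),
  stringsAsFactors     = FALSE
)

# Assumed cutoffs for illustration
min_owner_occ     <- 60
max_participation <- 50

# Tracts with many owner-occupiers but few exemptions are outreach candidates
outreach_candidates <- subset(
  toy_occupancy,
  owner_occ_rate >= min_owner_occ & pct_homestead_owners < max_participation
)

# Lowest participation first
outreach_candidates <- outreach_candidates[order(outreach_candidates$pct_homestead_owners), ]
print(outreach_candidates)
```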

# Create the dataset with the calculated rates
occupancy_analysis <- census_tracts_enriched %>%
  filter(pop_density > 0) %>%
  mutate(
    owner_occ_rate = (owner_hh / occupied_units) * 100,
    pct_homestead_owners = (homestead_count / owner_hh) * 100
  ) %>%
  select(GEOID, owner_occ_rate, pct_homestead_owners)

occupancy_analysis <- occupancy_analysis %>%
  filter(is.finite(pct_homestead_owners))  # drops NA, NaN, and Inf in one check

# Fit the linear model
occupancy_model <- lm(pct_homestead_owners ~ owner_occ_rate, data = occupancy_analysis)

# View model summary
summary(occupancy_model)
## 
## Call:
## lm(formula = pct_homestead_owners ~ owner_occ_rate, data = occupancy_analysis)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -69.127 -11.116   0.669  11.707 126.499 
## 
## Coefficients:
##                Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)    77.27015    3.00895  25.680 <0.0000000000000002 ***
## owner_occ_rate -0.11459    0.05411  -2.118              0.0348 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.61 on 381 degrees of freedom
## Multiple R-squared:  0.01164,    Adjusted R-squared:  0.009042 
## F-statistic: 4.485 on 1 and 381 DF,  p-value: 0.03483
# Visualize with linear fit
ggplot(occupancy_analysis, aes(x = owner_occ_rate, y = pct_homestead_owners)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE) +  # Changed from loess to lm
  labs(
    title = "Linear Relationship: Owner Occupancy vs. Homestead Participation",
    x = "Owner Occupancy Rate (%)",
    y = "Homestead Participation Rate (%)"
  ) +
  theme_minimal()
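
To put the fitted slope in practical terms, the sketch below plugs two owner occupancy rates into the estimated equation, using the coefficients copied from the model summary above. The helper name `predict_participation` is an assumption for illustration.

```r
# Coefficients copied from summary(occupancy_model)
intercept <- 77.27015
slope     <- -0.11459

# Predicted homestead participation (%) for a given owner occupancy rate (%)
predict_participation <- function(owner_occ_rate) {
  intercept + slope * owner_occ_rate
}

pred_25 <- predict_participation(25)
pred_75 <- predict_participation(75)
print(round(c(at_25_pct = pred_25, at_75_pct = pred_75, difference = pred_75 - pred_25), 2))
```

Moving from 25% to 75% owner occupancy lowers predicted participation by only about 5.7 percentage points, and with an R-squared near 0.01 the model explains very little of the tract-to-tract variation.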

Zoning Analysis
# Data frame of zoning types
zoning_types <- data.frame(
  Type = c(
    "Single Family Detached",
    "Single Family Attached",
    "Two-Family Attached",
    "Multi-Family",
    "Residential Mixed-Use",
    "Commercial Mixed-Use",
    "Industrial Residential Mixed-Use"
  ),
  Codes = c(
    "RSD1, RSD2, RSD3",
    "RSA1, RSA2, RSA3, RSA4, RSA5, RSA6",
    "RTA1",
    "RM1, RM2, RM3, RM4",
    "RMX1, RMX2, RMX3",
    "CMX1, CMX2, CMX2.5, CMX3, CMX4, CMX5",
    "IRMX"
  ),
  Description = c(
    "Detached houses on individual lots",
    "Attached and semi-detached houses on individual lots",
    "Two-family, semi-detached houses on individual lots",
    "Moderate to high-density multi-unit residential buildings",
    "Residential and mixed-use development, including master plan development",
    "Neighborhood to regional-serving mixed-use development",
    "Mix of low-impact industrial, artisan industrial, residential, and neighborhood commercial uses"
  )
)

# Formatted table
kable(zoning_types,
      col.names = c("Residential Type", "Zoning Codes", "Description"),
      caption = "Philadelphia Residential Zoning Classifications")
Philadelphia Residential Zoning Classifications

| Residential Type | Zoning Codes | Description |
|---|---|---|
| Single Family Detached | RSD1, RSD2, RSD3 | Detached houses on individual lots |
| Single Family Attached | RSA1, RSA2, RSA3, RSA4, RSA5, RSA6 | Attached and semi-detached houses on individual lots |
| Two-Family Attached | RTA1 | Two-family, semi-detached houses on individual lots |
| Multi-Family | RM1, RM2, RM3, RM4 | Moderate to high-density multi-unit residential buildings |
| Residential Mixed-Use | RMX1, RMX2, RMX3 | Residential and mixed-use development, including master plan development |
| Commercial Mixed-Use | CMX1, CMX2, CMX2.5, CMX3, CMX4, CMX5 | Neighborhood to regional-serving mixed-use development |
| Industrial Residential Mixed-Use | IRMX | Mix of low-impact industrial, artisan industrial, residential, and neighborhood commercial uses |

properties_filtered <- filtered_properties %>%
  select(
    zoning,
    homestead_exemption,
    is_residential,
    census_tract,
    shape
  )

# First create a zoning type classification
filtered_properties <- filtered_properties %>% 
  mutate(
    zoning_type = case_when(
      zoning %in% c("RSD1", "RSD2", "RSD3") ~ "Single Family Detached",
      zoning %in% c("RSA1", "RSA2", "RSA3", "RSA4", "RSA5", "RSA6") ~ "Single Family Attached",
      zoning %in% c("RTA1") ~ "Two-Family Attached",
      zoning %in% c("RM1", "RM2", "RM3", "RM4") ~ "Multi-Family",
      zoning %in% c("RMX1", "RMX2", "RMX3") ~ "Residential Mixed-Use",
      zoning %in% c("CMX1", "CMX2", "CMX2.5", "CMX3", "CMX4", "CMX5") ~ "Commercial Mixed-Use",
      zoning %in% c("IRMX") ~ "Industrial Residential Mixed-Use",
      TRUE ~ "Other"
    ),
    is_residential = ifelse(zoning_type != "Other", 1, 0)
  )


zoning_summary <- filtered_properties %>%
  filter(is_residential == 1) %>%
  group_by(zoning_type) %>%
  summarise(
    total_properties = n(),
    homestead_count = sum(homestead_exemption > 0, na.rm = TRUE),
    pct_homestead = (homestead_count / total_properties) * 100
  ) %>%
  arrange(desc(pct_homestead))


# Homestead rates by zoning type
zoning_summary_chart <- ggplot(zoning_summary, 
       aes(x = reorder(zoning_type, -pct_homestead), 
           y = pct_homestead)) +
  geom_bar(stat = "identity", 
           fill = "#e42524",
           alpha = 0.8) +
  geom_text(aes(label = round(pct_homestead,1)), 
            vjust = -0.5,
            size = 3) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major.x = element_blank(),
    plot.title = element_text(size = 18, face = "bold"),
    plot.subtitle = element_text(size = 14),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 14),
    legend.text = element_text(size = 14)
  ) +
  labs(
    title = "Homestead Exemption Rates by Zoning Category in Philadelphia",
    subtitle = "Residential and Mixed-Use Districts Only",
    x = "Zoning Category",
    y = "Percentage with Homestead Exemption",
    caption = "Source: Philadelphia Property Data, 2025"
  ) +
  scale_y_continuous(
    limits = c(0, max(zoning_summary$pct_homestead) * 1.1),
    labels = function(x) paste0(x, "%")
  )



zoning_summary_chart

ggsave("outputs/homestead-exemption-zoning_summary_chart.png", zoning_summary_chart, width = 10, height = 6)
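
The zoning codes appear twice above, once in the `zoning_types` table and once in the `case_when()` call, so the two can drift apart if only one is edited. As one possible alternative, the sketch below derives a lookup vector from a table of the same shape and classifies codes with it. The helper `classify_zoning`, the object names, and the shortened table are hypothetical.

```r
# Shortened stand-in for the zoning_types table defined above
zoning_lookup_table <- data.frame(
  Type  = c("Single Family Detached", "Single Family Attached", "Commercial Mixed-Use"),
  Codes = c("RSD1, RSD2, RSD3",
            "RSA1, RSA2, RSA3, RSA4, RSA5, RSA6",
            "CMX1, CMX2, CMX2.5, CMX3, CMX4, CMX5"),
  stringsAsFactors = FALSE
)

# Split each comma-separated code string and build a code -> type named vector
code_list <- strsplit(zoning_lookup_table$Codes, ",\\s*")
zoning_lookup <- setNames(
  rep(zoning_lookup_table$Type, lengths(code_list)),
  unlist(code_list)
)

# Classify codes; anything not in the lookup becomes "Other"
classify_zoning <- function(zoning) {
  zoning_type <- unname(zoning_lookup[zoning])
  ifelse(is.na(zoning_type), "Other", zoning_type)
}

print(classify_zoning(c("RSA5", "CMX2.5", "I2")))
```

With this approach the table shown to readers and the classification applied to the data come from a single definition.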

Modeling